

 asymptotic risk



Neural Information Processing Systems

As a consequence, empirical risk minimizers generally perform very poorly in extreme regions. It is the purpose of this paper to develop a general framework for classification in the extremes.






The Minimax Risk in Histogram-Based Uniformity Testing under Missing Ball Alternatives

Kipnis, Alon

arXiv.org Machine Learning

We study the problem of testing the goodness of fit of a discrete sample from many categories to the uniform distribution over the categories. As a class of alternative hypotheses, we consider the removal of an $\ell_p$ ball of radius $\epsilon$ around the uniform rate sequence for $p \leq 2$. When the number of samples $n$ and number of categories $N$ go to infinity while $\epsilon$ is small, the minimax risk $R_\epsilon^*$ in testing based on the sample's histogram (number of absent categories, singletons, collisions, ...) asymptotes to $2\Phi(-n N^{2-2/p} \epsilon^2/\sqrt{8N})$; $\Phi(x)$ is the normal CDF. This result allows the comparison of the many estimators previously proposed for this problem at the constant level, rather than at the rate of convergence of the risk or the scaling order of the sample complexity. The minimax test mostly relies on collisions in the very small sample limit, but otherwise behaves like the chi-squared test. Empirical studies over a range of problem parameters show that our estimate is accurate in finite samples and that the minimax test is significantly better than the chi-squared test or a test that only uses collisions. Our analysis relies on the asymptotic normality of histogram ordinates, the equivalence between the minimax setting and a Bayesian setting, and the characterization of the least favorable prior by reducing a multi-dimensional optimization problem to a one-dimensional problem.
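To make the quoted limit concrete, the following minimal numeric sketch (not from the paper) evaluates $2\Phi(-n N^{2-2/p} \epsilon^2/\sqrt{8N})$ at $p = 2$; all parameter values are arbitrary illustrations chosen only to show how the risk moves from near its maximum toward zero as $\epsilon$ grows.

```python
# Hedged numeric illustration of the asymptotic minimax risk formula quoted
# above: R*_eps ~ 2 * Phi(-n * N^(2 - 2/p) * eps^2 / sqrt(8 * N)),
# where Phi is the standard normal CDF. Parameter values are illustrative only.
from math import sqrt
from scipy.stats import norm


def asymptotic_minimax_risk(n: float, N: float, eps: float, p: float = 2.0) -> float:
    """Asymptotic minimax testing risk for an l_p ball alternative, p <= 2."""
    return 2.0 * norm.cdf(-n * N ** (2.0 - 2.0 / p) * eps ** 2 / sqrt(8.0 * N))


if __name__ == "__main__":
    # Risk drops sharply as the ball radius eps grows (problem gets easier).
    for eps in (0.0005, 0.001, 0.002):
        print(eps, asymptotic_minimax_risk(n=10_000, N=100_000, eps=eps))
```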


Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks

Nakada, Ryumei, Imaizumi, Masaaki

arXiv.org Machine Learning

We investigate the asymptotic risk of a general class of overparameterized likelihood models, including deep models. The recent empirical success of large-scale models has motivated several theoretical studies to investigate a scenario wherein both the number of samples, $n$, and parameters, $p$, diverge to infinity and derive an asymptotic risk at the limit. However, these theorems are only valid for linear-in-feature models, such as generalized linear regression, kernel regression, and shallow neural networks. Hence, it is difficult to investigate a wider class of nonlinear models, including deep neural networks with three or more layers. In this study, we consider a likelihood maximization problem without the model constraints and analyze the upper bound of an asymptotic risk of an estimator with penalization. Technically, we combine a property of the Fisher information matrix with an extended Marchenko-Pastur law and associate the combination with empirical process techniques. The derived bound is general, as it describes both the double descent and the regularized risk curves, depending on the penalization. Our results are valid without the linear-in-feature constraints on models and allow us to derive the general spectral distributions of a Fisher information matrix from the likelihood. We demonstrate that several explicit models, such as parallel deep neural networks, ensemble learning, and residual networks, are in agreement with our theory. This result indicates that even large and deep models have a small asymptotic risk if they exhibit a specific structure, such as divisibility. To verify this finding, we conduct a real-data experiment with parallel deep neural networks. Our results expand the applicability of the asymptotic risk analysis, and may also contribute to the understanding and application of deep learning.
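Since the result above singles out architectures with a divisible structure, the sketch below (an assumption, not the authors' code) shows the simplest reading of a "parallel deep neural network": several independent sub-networks applied to the same input, with their outputs averaged. Widths, depth, and the random weights are arbitrary placeholders.

```python
# Minimal sketch of a "parallel" (divisible) architecture: K independent
# ReLU sub-networks whose outputs are averaged. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)


def branch_forward(x, layers):
    """Forward pass of one small sub-network (ReLU on all but the last layer)."""
    h = x
    for W, b in layers[:-1]:
        h = np.maximum(W @ h + b, 0.0)
    W, b = layers[-1]
    return W @ h + b


def parallel_net(x, branches):
    """Average the outputs of all parallel branches."""
    return np.mean([branch_forward(x, layers) for layers in branches], axis=0)


# Four branches, each a 3-layer network on an 8-dimensional input.
dims = [8, 16, 16, 1]
branches = [
    [(rng.normal(size=(dims[i + 1], dims[i])) / np.sqrt(dims[i]), np.zeros(dims[i + 1]))
     for i in range(len(dims) - 1)]
    for _ in range(4)
]
print(parallel_net(rng.normal(size=8), branches))
```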


Asymptotic Risk of Bezier Simplex Fitting

Tanaka, Akinori, Sannai, Akiyoshi, Kobayashi, Ken, Hamada, Naoki

arXiv.org Machine Learning

The Bézier simplex fitting is a novel data-modeling technique that exploits geometric structure in the data to approximate the Pareto front of multi-objective optimization problems. There are two fitting methods based on different sampling strategies: the inductive skeleton fitting employs stratified subsampling from each skeleton of a simplex, whereas the all-at-once fitting uses non-stratified sampling that treats the simplex as a whole. In this paper, we analyze the asymptotic risks of these Bézier simplex fitting methods and derive the optimal subsample ratio for the inductive skeleton fitting. It is shown that the inductive skeleton fitting with the optimal ratio has a smaller risk when the degree of the Bézier simplex is less than three. These results are verified numerically under small to moderate sample sizes. In addition, we provide two complementary applications of our theory: a generalized location problem and multi-objective hyperparameter tuning of the group lasso. The former can be represented by a Bézier simplex of degree two, where the inductive skeleton fitting outperforms the all-at-once fitting; the latter can be represented by a Bézier simplex of degree three, where the all-at-once fitting has the advantage.
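For readers unfamiliar with the model, here is a minimal sketch (an assumption, not the authors' implementation) of evaluating a Bézier simplex: a degree-$d$ map $b(t) = \sum_{|\alpha|=d} \binom{d}{\alpha} t^{\alpha} p_{\alpha}$ from barycentric coordinates $t$ on the simplex to points in $\mathbb{R}^L$, with control points $p_{\alpha}$. The control points below are random placeholders; estimating them from data is exactly what the two fitting methods above do.

```python
# Sketch of Bezier simplex evaluation: b(t) = sum over multi-indices a with
# |a| = d of (d! / prod a_i!) * prod t_i^{a_i} * p_a. Control points are
# random placeholders for illustration.
from itertools import product
from math import factorial, prod
import numpy as np


def multi_indices(m, d):
    """All length-m multi-indices with nonnegative entries summing to d."""
    return [a for a in product(range(d + 1), repeat=m) if sum(a) == d]


def bezier_simplex(t, control, d):
    """Evaluate the Bezier simplex at barycentric coordinates t."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(next(iter(control.values())), dtype=float)
    for a, p in control.items():
        coef = factorial(d) / prod(factorial(ai) for ai in a)
        out += coef * prod(ti ** ai for ti, ai in zip(t, a)) * p
    return out


d, m, L = 2, 3, 3  # degree-2 simplex over 3 objectives, embedded in R^3
rng = np.random.default_rng(0)
control = {a: rng.normal(size=L) for a in multi_indices(m, d)}
print(bezier_simplex([0.2, 0.3, 0.5], control, d))
```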


How many variables should be entered in a principal component regression equation?

Xu, Ji, Hsu, Daniel

arXiv.org Machine Learning

We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features $p$ is at most the sample size $n$, the estimator under consideration coincides with the principal component regression estimator; when $p>n$, the estimator is the least $\ell_2$ norm solution over the selected features. We give an average-case analysis of the out-of-sample prediction error as $p,n,N \to \infty$ with $p/N \to \alpha$ and $n/N \to \beta$, for some constants $\alpha \in [0,1]$ and $\beta \in (0,1)$. In this average-case setting, the prediction error exhibits a "double descent" shape as a function of $p$.
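The estimator described above is simple to state in code. The sketch below (an assumption, not the authors' code) keeps the $p$ features with the largest variance and then fits ordinary least squares when $p \leq n$ and the minimum-$\ell_2$-norm interpolator when $p > n$; in the paper's setting of uncorrelated features, selecting coordinates by variance plays the role of selecting principal components. The toy data and parameter values are illustrative only.

```python
# Sketch of principal-component / minimum-norm regression over the top-p
# features ordered by decreasing variance. np.linalg.lstsq returns the
# minimum-norm least squares solution, covering both the p <= n and p > n cases.
import numpy as np


def pcr_or_min_norm(X, y, p):
    """Regression on the p features with largest sample variance."""
    order = np.argsort(X.var(axis=0))[::-1]   # features by decreasing variance
    cols = order[:p]
    beta_p, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    beta = np.zeros(X.shape[1])
    beta[cols] = beta_p
    return beta


# Toy data: n = 50 samples, N = 200 independent Gaussian features
# with decreasing variances.
rng = np.random.default_rng(0)
n, N = 50, 200
X = rng.normal(size=(n, N)) * np.linspace(2.0, 0.1, N)
beta_true = rng.normal(size=N) / np.sqrt(N)
y = X @ beta_true + 0.1 * rng.normal(size=n)
for p in (10, 50, 150):   # under- and over-parameterized regimes
    beta_hat = pcr_or_min_norm(X, y, p)
    print(p, np.linalg.norm(X @ beta_hat - y))   # training residual shrinks to ~0 for p > n
```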


Surprises in High-Dimensional Ridgeless Least Squares Interpolation

Hastie, Trevor, Montanari, Andrea, Rosset, Saharon, Tibshirani, Ryan J.

arXiv.org Machine Learning

Modern deep learning models involve a huge number of parameters. In nearly all applications of these models, current practice suggests that we should design the network to be sufficiently complex so that the model (as trained, typically, by gradient descent) interpolates the data, i.e., achieves zero training error. Indeed, in a thought-provoking experiment, Zhang et al. (2016) showed that state-of-the-art deep neural network architectures can be trained to interpolate the data even when the actual labels are replaced by entirely random ones. Despite their enormous complexity, deep neural networks are frequently seen to generalize well, in meaningful practical problems. At first sight, this seems to defy conventional statistical wisdom: interpolation (vanishing training error) is usually taken to be a proxy for overfitting or poor generalization (large gap between training and test error). In an insightful series of papers, Belkin et al. (2018b,c,a) pointed out that these concepts are, in general, distinct, and interpolation does not contradict generalization. For example, kernel ridge regression is a relatively well-understood setting in which interpolation can coexist with good generalization (Liang and Rakhlin, 2018). In this paper, we examine the prediction risk of minimum $\ell_2$ norm or "ridgeless" least squares regression, under